Insertion sequence transposition inactivates CRISPR-Cas immunity

CRISPR-Cas immunity systems safeguard prokaryotic genomes by inhibiting the invasion of mobile genetic elements. Here, we screened prokaryotic genomic sequences and identified multiple natural transpositions of insertion sequences (ISs) into cas genes, thus inactivating CRISPR-Cas defenses. We then generated an IS-trapping system, using Escherichia coli strains with various ISs and an inducible cas nuclease, to monitor IS insertions into cas genes following the induction of double-strand DNA breakage as a physiological host stress. We identified multiple events mediated by different ISs, especially IS1 and IS10, displaying substantial relaxed target specificity. IS transposition into cas was maintained in the presence of DNA repair machinery, and transposition into other host defense systems was also detected. Our findings highlight the potential of ISs to counter CRISPR activity, thus increasing bacterial susceptibility to foreign DNA invasion.

The authors identify a CRISPR system carrying an IS10 insertion in Erwinia amylovora. They then express this system heterologously in an E. coli host, along with an additional plasmid that carries protospacers targeted by the CRISPR system. They demonstrate that such targeting is deleterious in the presence of CRISPR, but that an inactivated CRISPR system removes this cost. Such inactivation can also lead to a positive interaction between host and plasmid when the plasmid carries antibiotic resistance markers and antibiotic selection is applied. This element of the paper is convincing but perhaps somewhat unsurprising.
They then design an assay to identify IS insertions into cas genes in a high-throughput way, by adding a self-targeting guide coupled with an inducible Cas9 system. This allows them to probe what determines insertion into cas genes and identify the relative frequencies of different IS elements.
Lastly, they use a bioinformatic screen to assess the distributions of IS elements across taxa and within specific anti-MGE defences, such as restriction-modification (RM) systems.
The experiments are comprehensive and the bioinformatic analysis seems valuable for describing the distributions of various IS elements. My main concerns are that this may not be of particular relevance to anyone other than those studying IS elements and that the narrative of the paper is not clear. I am also not really persuaded that the evidence presented demonstrates that IS elements are 'activated' (line 197) to insert into cas genes (versus selection after a random event).
Some of the results are very interesting, such as the observation that IS insertion into cas genes is much more common than SNPs and indels, potentially speeding up the evolutionary response during selection. I'm also not very familiar with the IS literature, and some of these observations may be quite novel, e.g. which IS elements are most prevalent in E. coli, and which have the ability to tolerate a range of insertion sites / motifs. Perhaps with a clearer narrative this paper would appeal to a wider audience.
In its present form I found it hard to follow the narrative and rationale for some of the experiments, and it was unclear to me why this would be of interest to those beyond the immediate IS field.

Minor:
Line 37: is CRISPR really more dominant than RM systems? Probably not, given the prevalence of both systems.
Line 68: "we hypothesized that the collapse of CRISPR machinery might be a fitness cost of the host during stress survival". This is unclear to me. Do the authors mean that the loss of CRISPR is the fitness cost, which is possible if susceptibility to parasitic MGEs increases susceptibility to phages, for example? I could see a scenario where, under stress, the host needs to adapt, and susceptibility to incoming plasmids could be highly beneficial if they confer advantageous traits. Some clarity around this would help.
Line 121: SYH01 and SYH02. Is the only difference between these hosts the vector (pEraCas or pEraCas-IS10)? If so, why not simply refer to the plasmid names?

Reviewer #2 (Remarks to the Author):
The authors have adequately addressed my comments in their revised manuscript.
Response: We sincerely appreciate the reviewer's kind comments and constructive suggestions.
Reviewer #3 (Remarks to the Author): I find the study convincing that ISs can mediate the trade-off between CRISPR immunity and the acquisition of beneficial MGEs. This element of the work is quite thorough and the majority of my concerns have been addressed adequately.
Response: We appreciate the reviewer's high evaluation of the improvement of our manuscript during the revision.
I still have one issue with regard to the occurrence of IS transpositions into cas genes relative to other genes or, put another way, how intentional or specific these insertions are. The suggested analysis in the rebuttal that compares IS transpositions into cas genes vs. CRISPR arrays is inadequate because these genes are functionally linked and therefore not independent. For example, if an IS element inserts into a cas gene, then the CRISPR system becomes inert, meaning there will be no additional effect from subsequent insertions into the arrays. CRISPR arrays may also be more recombinogenic due to the repeat sequences, which might influence IS stability. Why not compare the overall IS rate across the whole genome, or per kb?
With that said, this does not diminish the functional implications of insertions into cas genes. I agree with the overall conclusions that such ISs can balance the costs and benefits of MGEs. Adding this distinction to the discussion would probably be sufficient.
Response: We sincerely appreciate the reviewer's insightful comment regarding our choice to analyze IS transpositions into cas genes alongside CRISPR arrays. We selected CRISPR arrays as a suitable comparative target, primarily because of their relatively similar occurrence frequencies to cas genes [4]. Our initial analysis uncovered a distinct predilection for IS transpositions, exclusively occurring within cas genes, while no IS insertions were detected within the CRISPR arrays. However, as mentioned by the reviewer, this analytical method may not be sufficiently thorough. Nevertheless, the exclusive preferential transpositions of ISs into cas genes and the absence of IS insertions within the CRISPR arrays indicate a significant disparity in statistical probability. This distinction, to a certain extent, implies that IS transpositions may not be completely arbitrary.
To comprehensively evaluate the observed discrepancy, we endeavored to conduct a theoretical analysis, as recommended by the reviewer, comparing the IS transposition rate across the entire genome and normalizing it on a length scale. We therefore formulated a series of mathematical equations as follows, to quantify the discrepancy in IS transposition rates between the cas and non-cas genes on a length scale.
Equation (1) calculates the IS transposition rate in all cas genes, denoted as $R_{\text{cas}}$.
Equation (2) determines the IS transposition rate in non-cas genes, denoted as $R_{\text{non-cas}}$. It is obtained by multiplying the number of IS insertions into non-cas genes ($N_{\text{non-cas}}$) by the difference between the genome length and the sum of the lengths of all cas genes, divided by the length of the genome.
Equation (3) calculates the fold change in gene length, denoted as $FC_{\text{len}}$. It is obtained by dividing the difference between the length of the genome and the sum of the lengths of all cas genes by the sum of the lengths of all cas genes.
Equation (4) determines the fold change in transposition rate, denoted as $FC_{R}$. It is obtained by dividing the IS transposition rate in non-cas genes by the IS transposition rate in all cas genes.
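For reference, the verbal definitions of equations (2) to (4) can be written out as below. This is a transcription of the wording above rather than the original typeset equations; the symbols $L_{\text{genome}}$ (genome length), $L_{\text{cas},i}$ (length of the i-th cas gene) and $N_{\text{non-cas}}$ (number of IS insertions into non-cas genes) are assumed names. Equation (1) is not written out because its form is not described in the text.

```latex
\begin{align}
R_{\text{non-cas}} &= N_{\text{non-cas}} \times
  \frac{L_{\text{genome}} - \sum_{i} L_{\text{cas},i}}{L_{\text{genome}}} \tag{2} \\
FC_{\text{len}} &=
  \frac{L_{\text{genome}} - \sum_{i} L_{\text{cas},i}}{\sum_{i} L_{\text{cas},i}} \tag{3} \\
FC_{R} &= \frac{R_{\text{non-cas}}}{R_{\text{cas}}} \tag{4}
\end{align}
```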
Based on the established formulas, we can gain insights into the relationship between sequence length, transposition rates and the presence of preferences or biases (how intentional or specific these insertions are) in IS transposition across the genome.
Specifically, when there is a substantial disparity between the fold change in gene length ($FC_{\text{len}}$) and the fold change in transposition rate ($FC_{R}$), it suggests that sequence length does not primarily influence the rate of IS transposition across the entire genome, potentially implying that IS transposition may not be random but rather exhibit a certain preference (put simply, if IS transposition is not biased, the longer potential target sequences ought to possess a higher likelihood of IS transposition occurrence). Conversely, if there is minimal difference between the $FC_{\text{len}}$ and $FC_{R}$ values, it indicates that sequence length indeed predominantly impacts the overall rate of IS transposition, potentially implying that IS transposition may lack a specific bias and instead exhibit a more random pattern.
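The length-scale intuition in this paragraph can be illustrated with a small calculation. The sketch below is not the analysis defined by equations (1) to (4); it assumes a simpler per-kb normalization of raw insertion counts, along the lines of the reviewer's suggestion. The function name, dictionary keys and all input numbers are hypothetical.

```python
def per_kb_comparison(genome_len_bp, cas_lens_bp, n_cas, n_non_cas):
    """Compare IS insertion density (insertions per kb) in cas vs. non-cas sequence.

    Assumption: insertion counts are plain counts of observed IS insertions;
    this is a per-kb sketch, not the authors' equations (1) and (2).
    """
    cas_total = sum(cas_lens_bp)                 # summed length of all cas genes
    non_cas_total = genome_len_bp - cas_total    # remaining genome length
    if cas_total <= 0 or non_cas_total <= 0:
        raise ValueError("cas and non-cas lengths must both be positive")

    density_cas = n_cas / (cas_total / 1000)              # insertions per kb in cas genes
    density_non_cas = n_non_cas / (non_cas_total / 1000)  # insertions per kb elsewhere

    # Fold change in length (non-cas over cas), as in the verbal definition of equation (3).
    fc_len = non_cas_total / cas_total

    # Ratio of densities: close to 1 if insertions simply scale with target length.
    if density_non_cas > 0:
        cas_enrichment = density_cas / density_non_cas
    else:
        cas_enrichment = float("inf")

    return {
        "density_cas": density_cas,
        "density_non_cas": density_non_cas,
        "fc_len": fc_len,
        "cas_enrichment": cas_enrichment,
    }


# Made-up illustrative numbers: a 5 Mb genome with 5 kb of cas genes.
result = per_kb_comparison(5_000_000, [3000, 1000, 1000], n_cas=2, n_non_cas=20)
print(result)
```

With these invented inputs the non-cas sequence is about 999 times longer than the cas genes, so under unbiased insertion one would expect roughly 999 times more non-cas insertions than cas insertions; a density ratio far from 1 would point to a preference. The constraints on essential genes and on detecting genuine transposition events would apply to this calculation as well.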
However, we encountered certain constraints during the practical analysis when evaluating the transposition rates of ISs across the entire genome. Firstly, considering the abundance of essential genetic elements within bacterial chromosomes, any detrimental transposition of ISs into essential genes would lead to microbial lethality [5], so insertions that disrupt essential genes are not expected to be observed. In light of this, the inadvertent influence of the summed lengths of numerous essential genes on the actual fold-change values ($FC_{\text{len}}$ and $FC_{R}$) should be duly acknowledged. Undoubtedly, a straightforward and effective approach would entail excluding all essential genes from each genome and focusing solely on the analysis of non-essential genes.
Currently, the thorough characterization of essential genes still remains challenging [6], thus impeding the accurate and efficient analysis of non-essential genes within each genome. Secondly, the existing IS detection software is limited to determining whether ISs are present or not, falling short in accurately differentiating between ISs that result from transposition events and those that are inherently present. This limitation will also hamper the precise evaluation of the insertion counts for cas and non-cas genes ($N_{\text{cas}}$ and $N_{\text{non-cas}}$) in equations (1) and (2), respectively, thereby impacting the overall analytical outcomes. In summary, our theoretical analysis reveals that this approach, which analyzes the transposition rates of ISs at the genomic level in order to infer their randomness or preference, may also exhibit inherent limitations. Nonetheless, we believe that with ongoing efforts to overcome these limitations in future research, this analytical approach holds great promise.